Unsupervised Text Recap Extraction for TV Series
Authors
Abstract
Sequences found at the beginning of TV shows help the audience absorb the essence of previous episodes, and grab their attention with upcoming plots. In this paper, we propose a novel task, text recap extraction. Compared with conventional summarization, text recap extraction captures the duality of summarization and plot contingency between adjacent episodes. We present a new dataset, TVRecap, for text recap extraction on TV shows. We propose an unsupervised model that identifies text recaps based on plot descriptions. We introduce two contingency factors, concept coverage and sparse reconstruction, that encourage recaps to prompt the upcoming story development. We also propose a multi-view extension of our model which can incorporate dialogues and synopses. We conduct extensive experiments on TVRecap, and conclude that our model outperforms summarization approaches.
Similar resources
Unsupervised method for the acquisition of general language paraphrases for medical compounds
Medical information is widespread in modern society (e.g. scientific research, medical blogs, clinical documents, TV and radio broadcast, novels). Moreover, everybody’s life may be concerned with medical problems. However, the medical field conveys very specific and often opaque notions (e.g., myocardial infarction, cholecystectomy, abdominal strangulated hernia, galactose urine), that are diff...
Structural Linguistics and Unsupervised Information Extraction
A precondition for extracting information from large text corpora is discovering the information structures underlying the text. Progress in this direction is being made in the form of unsupervised information extraction (IE). We describe recent work in unsupervised relation extraction and compare its goals to those of grammar discovery for science sublanguages. We consider what this work on gr...
UPC System for the 2015 MediaEval Multimodal Person Discovery in Broadcast TV task
This paper describes a system to identify people in broadcast TV shows in a purely unsupervised manner. The system outputs the identity of people that appear, talk and can be identified by using information appearing in the show (in our case, text with person names). Three types of monomodal technologies are used: speech diarization, video diarization and text detection / named entity recogniti...
Multilingual Artificial Text Extraction and Script Identification from Video Images
This work presents a system for extraction and script identification of multilingual artificial text appearing in video images. As opposed to most of the existing text extraction systems which target textual occurrences in a particular script or language, we have proposed a generic multilingual text extraction system that relies on a combination of unsupervised and supervised techniques. The un...
An Overview of Open Information Extraction
Open Information Extraction (OIE) is a recent unsupervised strategy for extracting large numbers of basic propositions (verb-based triples) from massive text corpora, and it scales to Web-size document collections. We introduce the main properties of this extraction method.